Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The Differential State Framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing data-driven representation and a slowly changing, implicitly stable state. This requires hardly any more parameters than a classical, simple recurrent network. Within the DSF framework, a new architecture is presented, the Delta-RNN. In language modeling at the word and character levels, the Delta-RNN outperforms popular complex architectures, such as the Long Short Term Memory (LSTM) and the Gated Recurrent Unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the Delta-RNN's performance is comparable to that of complex gated architectures.
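The sketch below is a minimal illustration of the interpolation idea described above: the next hidden state is a learned element-wise mixture of a fast, data-driven candidate and the slowly changing previous state, using roughly the parameter budget of a simple recurrent network. It is not the paper's exact Delta-RNN formulation; the function name `delta_rnn_step`, the parameter names, the dimensions, and the gating bias `b_r` are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def delta_rnn_step(x_t, h_prev, params):
    """One recurrent step: interpolate between a candidate state and h_prev."""
    W, U, b, b_r = params["W"], params["U"], params["b"], params["b_r"]
    # Fast-changing, data-driven candidate state.
    z_t = np.tanh(W @ x_t + U @ h_prev + b)
    # Gate deciding how much of the previous (stable) state to retain;
    # a positive bias b_r keeps the state slowly changing by default.
    r_t = sigmoid(W @ x_t + b_r)
    # Interpolation: element-wise mix of the new candidate and the old state.
    return (1.0 - r_t) * z_t + r_t * h_prev


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hid = 8, 16
    params = {
        "W": rng.standard_normal((n_hid, n_in)) * 0.1,
        "U": rng.standard_normal((n_hid, n_hid)) * 0.1,
        "b": np.zeros(n_hid),
        "b_r": np.ones(n_hid),  # encourage retaining the previous state
    }
    h = np.zeros(n_hid)
    for _ in range(5):  # run a short input sequence
        h = delta_rnn_step(rng.standard_normal(n_in), h, params)
    print(h.shape)  # (16,)
```

Note that the gate reuses the input-to-hidden matrix `W` plus a bias, so the only additions over a simple recurrent network are bias vectors, which is consistent with the abstract's claim about parameter cost; this reuse is a design choice of the sketch, not a claim about the paper's exact parameterization.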